Archive Magazine CD 1995

home *** CD-ROM | disk | FTP | other *** search

/ Archive Magazine CD 1995 / Archive Magazine CD 1995.iso / discs / prog_disc / volume_6 / issue_12 / pipelinez / ReadMe < prev next >

Wrap

Text File | 1993-07-25 | 10.0 KB | 224 lines

%OP%VS4.13 (28-Apr-92), Gerald L Fitton, R4000 5966 9904 9938 %OP%DP0 %OP%IRY %OP%PL0 %OP%HM0 %OP%FM0 %OP%BM0 %OP%LM4 %OP%PT1 %OP%PDPipeLine %OP%WC1026,2262,648,1748,0,0,0,0 %CO:A,72,72% %C%Curve Fitting %C%by Gerald L Fitton Keywords: Best Fit Fitton Introduction For over a year now I have been asked to explain how to use the PipeDream functions linest(y,x) and trend({c,m},x). I kept promising myself that I ought to get around to it but it has taken a problem sent to me by John Nottage to finally make my mind up that now the time has surely come. I have decided that, in addition to describing the 'line of best fit' functions linest(y,x) and the trend({c,m},x) function, I could use the opportunity to show you how using Arrays and Names often make formulae much more usable and readable. Best Straight line Let's start by creating the spreadsheet shown in the screen shot, figure 1, below. The formula for a straight line is y = m*x + c. The parameter c is called 'the constant term' and m is called 'the gradient'. In the range A12A17 I have inserted half a dozen values of x and, in the range B12B17, the corresponding values of y have been calculated by putting c and m (from B6 and B7) into the formula y = m*x + c. This formula is used in many engineering and financial applications. An example is "total cost = marginal cost of a unit * number of units + fixed cost". The "marginal cost of a unit" is the gradient, m, and in this example, it is the extra cost of producing one more unit. The "fixed cost", c, is the cost you incur just getting ready to make some units (such as machinery) but it doesn't include the "marginal costs" such as raw material and labour. Names In slot B12 I could have typed B$7*A12+B$6 and then replicated it down through the range B12B17 - but I haven't done that. In my version slot B12 contains the formula set_value(y,m*x+c). I suggest that my version makes the spreadsheet much more readable but at the expense of a little more effort on the part of the writer of the application. It may help you follow the next paragraph if you refer to the screen shot, figure 2 below. The four letters within the set_value(y,m*x+c) function, y, m, x and c, are all PipeDream Names. If, in the final version of this speadsheet, you were to place the pointer over the italic f (just to the left of the tick and cross and to the right of the PipeDream logo - near the formula line) and click the mouse select (left) button then a sub menu called 'functions' will open and, at the bottom of the list, you will see 'Edit name'. If you were to run the pointer through the arrow to the right of 'Edit name' you would see that there are seven names defined. These are x, y, y_est, m, c, c_est, m_est. You could run the pointer through, say, c, and you would see that the definition of the name c is the content of the slot B6. Similarly you could establish that the name x is the range of slots A12A17. This range contains the values of x! To define a name such as x as the range of slots A12A17 you click on the italic f, run the pointer through 'Define name', type x in the dialogue box alongside 'Name:', you type A12A17 into the dialogue box alongside 'Refers to:' and finally click on the OK box. The names I have defined are: x as A12A17, y as B12B17, y_est as D12D17, m as B7, c as B6, c_est as D6 and m_est as D7. The set_value(,) function The function set_value(,) is a simple way of evaluating a function many times and 'poking' the answers into a range of slots. It is particularly simple when used with names. Yes, I know it takes time to define the names, but, for large ranges particularly, it is worth it. The function in slot B12 is set_value(y,m*x+c). Having defined the names as slots or ranges of slots set_value(y,m*x+c) calculates all the y values from m*x+c and 'pokes' the answers into the range of slots defined as y. What is 'clever' about PipeDream is that each value in the y range is individually calculated using its corresponding x and the result 'poked' into the 'correct' y slot. The set_value(,) formula is written only once and does not have to be replicated (or copied) down the range of slots. Using names in this way reduces the size of the PipeDream file in memory (the function appears in only one slot) and on the disc. Because the file is smaller the loading time and the time to recalculate is also reduced. I recommend using names and set_value(,) to you as a way of reducing memory usage, disc space and speeding up loading, saving and recalculation. The linest(y,x) function In slot D6 I have entered the formula index(linest(y,x),1,1). Let's deal with linest(y,x) first. The function linest(y,x) takes as its arguments only two ranges of values, in our case y and x. Please note that the y range is the first argument. In our simple example x is a single column of values, a more complex version (which we'll deal with another day) uses one y range and a many column x range! When used in this simple way linest(y,x) returns an array of two numbers in a single slot. The two numbers are the row array {c_est,m_est} where c_est and m_est are the 'constant term' and the 'gradient' of a straight line which is the 'best' straight line for the set of points. Strictly it is the 'least squares regression line of y on x'! On page 250 of the First Edition of the PipeDream 4 Reference Guide the function linest(,,,,) is shown with five arguments. All too often I have been asked how to use the last three arguments. The answer is that you can't because, in PipeDream version 4.13 they don't exist and, as far as I know, they are never likely to. The function linest(y,x) can take only two arguments but the second argument, x, can consist of many columns. Let's deal with what is called 'multivariate regression' another day. For now let it suffice that the arguments stats, constant_parameters and measured_errors of the First Edition do not and will not exist. The index(array,col,row) function To 'split up' the array returned by linest(y,x) into its two separate terms, {c_est,m_est}, we need to use the index(array,col,row) function. The index function I have entered into slot D6 is index(linest(y,x),1,1). Since linest(y,x) = {c_est,m_est}, array, index(linest(y,x),1,1) = (index({c_est,m_est},1,1). The array {c_est,m_est} has only one row and so the last (third) argument of index({c_est,m_est},1,1) must be 1. The second argument, 1, selects the first element of the array which is c_est. Hence index(linest(y,x),1,1) = c_est. The value in D6 is our best estimate of the constant term c_est for the 'least squares regression line of y on x'! Slot D7 contains index(linest(y,x),2,1). The only difference is the second argument, 2, which selects the gradient, m_est. The value in D7 is our best estimate of the gradient, m_est. A point you should realise here is that I've 'cheated' by using 'exact' values of y calculated from the formula y = m*x + c so that, if I have the 'right' formulae in column D, then my 'estimated' values for c_est and m_est are bound to be 'exact' and equal to the values of c and m in B6 and B7! The trend({c,m},x) function From the screen shot, figure 1, you will see that slot D12 contains the formula set_value(y_est,trend({c_est,m_est},x)) - and we've dealt with set_value(,). Note that y_est is the name of the range D12D17, so let's have a look at trend({c_est,m_est},x). The function trend({c_est,m_est},x) returns a column array when x is a column (it is in this case) or a row array when x is a row. The column array returned by trend({c_est,m_est},x) is exactly the same array as that given by the formula {m_est*x + c_est}. We saw how {m*x + c} returned an array when we looked at slot B12. I think that, when I know the values of c and m (separately), then I would prefer to enter m_est*x + c_est (without the curly brackets) instead of trend({c_est,m_est},x); it's shorter and it does exactly the same thing. On another occasion when we have a look at the function logest(y,x) to produce log linear and log log graphs we shall see that this second approach has its advantages. The function trend(,) has the advantage if you don't want to use the values of {c,m} separately. In those cases you can write trend(linest(y,x),x); and we'll find that we can't do that with log linear and log log curve fitting. To summarise where we are up to now. In the spreadsheet of figure 1 there are only five slots containing formula. These are: B12 = set_value(y, m*x+c) C12 = set_value(C12C17,x) D12 = set_value(y_est,trend({c_est,m_est},x)) D6 = index(linest(y,x),1,1) D7 = index(linest(y,x),2,1) There are seven PipeDream Names which have been defined. These are: x = A12A17 y = b12B17 y_est = D12D17 m = B7 c = B6 m_est = D7 c_est = D6 You can alter the values in B6 and B7 and in the range A12A17 but, for now, you should change nothing else. Best Straight line Up to now the y values in B12B17 have been generated from a formula which uses the values c and m in B6 and B7. Place the cursor in slot B12, delete the formula and replace it with the number 1. Nothing much should happen but the values of y are now no longer calculated from the formula using c and m from B6 and B7. Enter a few y values which are different from but approximately the same as those generated by the formula and, when the sheet has recalculated, you will find that y and y_est do not quite match. If you plot the values of y against x you will no longer have a straight line (unless you're lucky) but plotting y_est against x will give you a perfect straight line, the 'best' straight line for the x and y values. Don't save the modified sheet unless you change the name! Charting To help you visualise what is going on you might like to try creating a chart with two sets of data. The first set should use the x and y values in A12B17 and you can plot it in red. The second set should use the data in C12D17 and you can plot it in blue. You won't see the blue points until you modify the y data as described in the previous paragraph.